Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization
In this paper we study the personalized text search problem. The keyword-based
search method in conventional algorithms is inefficient at understanding users'
intentions because semantic meaning, user profiles, and user interests are not
always considered. First, we propose a novel text search algorithm using an
inverse filtering mechanism that is very efficient for label-based item search.
Second, we adopt a Bayesian network to implement user interest prediction for
improved personalized search. According to user input, it searches for related
items using keyword information and predicted user interest. Third, word
vectorization is used to discover potential targets according to semantic
meaning. Experimental results show that the proposed search engine has improved
efficiency and accuracy and that it can operate on embedded devices with very
limited computational resources.
Diversity and distribution of physical dormant species in relation to ecosystem and life-forms
An impermeable seed/fruit coat, i.e. physical dormancy (PY), which occurs only in several genera of 18 angiosperm families, plays an important role in controlling seed persistence and germination timing. It has been theoretically speculated that PY is more prevalent in drylands than in moist vegetation zones, but unequivocal support for this assertion is currently unavailable. The broad objective of this contribution was to examine the distribution of PY across the vegetation zones of tropical and temperate ecosystems using a data set of 13,792 species. The proportion of species with PY in the tropics (19%) is higher than in the temperate ecosystem (15%). However, in both tropical and temperate ecosystems, there is a clear trend that PY is less common in moist and low-temperature vegetation zones than in dry and high-temperature vegetation. In the tropics, PY is more prevalent in dry woodlands (33%) and tropical deciduous forests (27.3%) than in the evergreen rain forest (9%). Similarly, in the temperate zone, dry vegetation with seasonal rainfall such as Matorral (22.3%) and deserts (19.5%) has a higher proportion of PY species than moist warm woodlands (8.1%) and deciduous forest (9%). Although PY is a trait found in various life-forms, it appears to be less common in trees, particularly those of the temperate zone. We discuss the ecological adaptation of PY in the dry ecosystem and consider the mechanisms of persistence and dormancy break in PY and physiologically dormant (PD) species.
ValueNet: A New Dataset for Human Value Driven Dialogue System
Building a socially intelligent agent involves many challenges, one of which
is to teach the agent to speak guided by its values, like a human. However,
value-driven chatbots are still understudied in the area of dialogue systems.
Most existing datasets focus on commonsense reasoning or social norm modeling.
In this work, we present a new large-scale human value dataset called ValueNet,
which contains human attitudes on 21,374 text scenarios. The dataset is
organized in ten dimensions that conform to the basic human value theory in
intercultural research. We further develop a Transformer-based value regression
model on ValueNet to learn the utility distribution. Comprehensive empirical
results show that the learned value model could benefit a wide range of
dialogue tasks. For example, by teaching a generative agent with reinforcement
learning and the rewards from the value model, our method attains
state-of-the-art performance on the personalized dialog generation dataset:
Persona-Chat. With values as additional features, existing emotion recognition
models can capture rich human emotions in the context, which further
improves the empathetic response generation performance on the
EmpatheticDialogues dataset. To the best of our knowledge, ValueNet is the
first large-scale text dataset for human value modeling, and we are the first
to incorporate a value model into emotionally intelligent dialogue
systems. The dataset is available at https://liang-qiu.github.io/ValueNet/.
Comment: Paper accepted by AAAI 202
TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution
The goal of scene text image super-resolution is to reconstruct
high-resolution text-line images from unrecognizable low-resolution inputs. The
existing methods relying on the optimization of pixel-level loss tend to yield
text edges that exhibit a notable degree of blurring, thereby exerting a
substantial impact on both the readability and recognizability of the text. To
address these issues, we propose TextDiff, the first diffusion-based framework
tailored for scene text image super-resolution. It contains two modules: the
Text Enhancement Module (TEM) and the Mask-Guided Residual Diffusion Module
(MRD). The TEM generates an initial deblurred text image and a mask that
encodes the spatial location of the text. The MRD is responsible for
effectively sharpening the text edge by modeling the residuals between the
ground-truth images and the initial deblurred images. Extensive experiments
demonstrate that our TextDiff achieves state-of-the-art (SOTA) performance on
public benchmark datasets and can improve the readability of scene text images.
Moreover, our proposed MRD module is plug-and-play and effectively sharpens
the text edges produced by SOTA methods. This enhancement improves the
readability and recognizability of the results generated by SOTA methods
without requiring any additional joint training. Code is available at
https://github.com/Lenubolim/TextDiff
The Trickle-down Impact of Reward (In-)consistency on RLHF
Standard practice within Reinforcement Learning from Human Feedback (RLHF)
involves optimizing against a Reward Model (RM), which itself is trained to
reflect human preferences for desirable generations. A notable subject that is
understudied is the (in-)consistency of RMs -- whether they can recognize the
semantic changes to different prompts and appropriately adapt their reward
assignments -- and their impact on the downstream RLHF model.
In this paper, we visit a series of research questions relevant to RM
inconsistency: (1) How can we measure the consistency of reward models? (2) How
consistent are the existing RMs and how can we improve them? (3) In what ways
does reward inconsistency influence the chatbots resulting from the RLHF model
training?
We propose Contrast Instructions -- a benchmarking strategy for the
consistency of RM. Each example in Contrast Instructions features a pair of
lexically similar instructions with different ground truth responses. A
consistent RM is expected to rank the corresponding instruction and response
higher than other combinations. We observe that current RMs trained with the
standard ranking objective fail miserably on Contrast Instructions compared to
average humans. To show that RM consistency can be improved efficiently without
using extra training budget, we propose two techniques, ConvexDA and
RewardFusion, which enhance reward consistency through extrapolation during the
RM training and inference stages, respectively. We show that RLHF models trained
with a more consistent RM yield more useful responses, suggesting that reward
inconsistency exhibits a trickle-down effect on the downstream RLHF process.
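As a rough illustration of the consistency check described in this abstract, the sketch below counts a contrast pair as consistent only when each instruction scores its own response above the other instruction's response. This is one possible reading of "rank the corresponding instruction and response higher than other combinations", not the paper's exact metric. The names consistency_rate and toy_reward and the example data are hypothetical, and toy_reward is a word-overlap stand-in for a trained reward model.

```python
from typing import Callable, List, Tuple

# One contrast example: (instruction_a, response_a, instruction_b, response_b).
# The two instructions are lexically similar but have different correct responses.
ContrastPair = Tuple[str, str, str, str]


def consistency_rate(
    reward: Callable[[str, str], float],
    pairs: List[ContrastPair],
) -> float:
    """Fraction of pairs where each matching (instruction, response)
    combination scores strictly higher than the mismatched one."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    consistent = 0
    for inst_a, resp_a, inst_b, resp_b in pairs:
        ok_a = reward(inst_a, resp_a) > reward(inst_a, resp_b)
        ok_b = reward(inst_b, resp_b) > reward(inst_b, resp_a)
        if ok_a and ok_b:
            consistent += 1
    return consistent / len(pairs)


def toy_reward(instruction: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: Jaccard word overlap."""
    a = set(instruction.lower().split())
    b = set(response.lower().split())
    return len(a & b) / max(len(a | b), 1)


# Hypothetical example data, not taken from the benchmark.
pairs = [
    (
        "name the capital of france",
        "the capital of france is paris",
        "name the capital of spain",
        "the capital of spain is madrid",
    ),
]
rate = consistency_rate(toy_reward, pairs)
```

A reward function that ignores the instruction (for example, one returning a constant) scores 0.0 under this check, because the comparison is strict.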
SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues
Inferring social relations from dialogues is vital for building emotionally
intelligent robots to interpret human language better and act accordingly. We
model the social network as an And-or Graph, named SocAoG, for the consistency
of relations among a group and leveraging attributes as inference cues.
Moreover, we formulate a sequential structure prediction task, and propose an
α-β-γ strategy to incrementally parse SocAoG for the
dynamic inference upon any incoming utterance: (i) an α process
predicting attributes and relations conditioned on the semantics of dialogues,
(ii) a β process updating the social relations based on related
attributes, and (iii) a γ process updating an individual's attributes based
on interpersonal social relations. Empirical results on DialogRE and MovieGraph
show that our model infers social relations more accurately than the
state-of-the-art methods. Moreover, the ablation study shows the three
processes complement each other, and the case study demonstrates the dynamic
relational inference.
Comment: Long paper (oral) accepted by ACL-IJCNLP 202
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Large language models (LLMs) have achieved remarkable progress in various
natural language processing tasks with emergent abilities. However, they face
inherent limitations, such as an inability to access up-to-date information,
utilize external tools, or perform precise mathematical reasoning. In this
paper, we introduce Chameleon, a plug-and-play compositional reasoning
framework that augments LLMs to help address these challenges. Chameleon
synthesizes programs to compose various tools, including LLM models,
off-the-shelf vision models, web search engines, Python functions, and
rule-based modules tailored to user interests. Built on top of an LLM as a
natural language planner, Chameleon infers the appropriate sequence of tools to
compose and execute in order to generate a final response. We showcase the
adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP.
Notably, Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA,
significantly improving upon the best published few-shot model by 11.37%; using
GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the
state-of-the-art model, leading to a 98.78% overall accuracy on TabMWP. Further
studies suggest that using GPT-4 as a planner exhibits more consistent and
rational tool selection and is able to infer potential constraints given the
instructions, compared to other LLMs like ChatGPT.
Comment: 25 pages, 10 figures. Project page: https://chameleon-llm.github.i